The Geodemographics of Online Gambling Behaviours in Great Britain

Shunya Kimura, Justin van Dijk and Paul Longley

2025-03-18

Conflict of Interest

  • Co-funded by: ESRC UBEL DTP and GambleAware

  • Data provider: one of the “Big Five” British gambling operators

    • Independent research: no involvement in research design and no contractual obligations were in place

    • Disclosure risk control: data anonymised and no revenue indicators shown

Contents

  • Research Context

  • Data Access & Feature Curation

  • Clustering

  • Case Study

  • Novelty & Implications

Research Context

Online Gambling in Great Britain

  • Online gambling generates 44.2% of total gambling revenue (+6.9% from previous year) [1]

  • Gambling harm affects the health and well-being of individuals and those around them

  • Approximately 0.4% [2] to 2.5% [3] of adults experience ‘problem gambling’

Rationale

  • Survey data

    • Gambling Survey for Great Britain (GSGB); Health Survey for England (HSE); GambleAware Annual GB Treatment and Support Surveys

    • Measurement error: self-reported → social desirability bias / recall errors

    • Selection bias: online surveys attract certain segments of the population

    • High cost → small sample size → small sampling fraction

  • Smart data

    • Large sample size; spatially & temporally granular; regular intervals

    • Reveal actual patterns of play rather than self-reported ones

Research Aim

  1. Leverage rich behavioural data to analyse online gambling patterns

Identify distinctive behaviours through clustering techniques

  2. Contextualise findings using neighbourhood-level indicators

Reflect the social and physical living conditions and lifestyle factors influencing these patterns

Data Access & Feature Curation

Data Access

  • Unique autonomous access

  • Certain data tables were gatekept

  • Research was conducted within a secure environment

  • Quarterly data safety training was mandated

  • Data were securely destroyed after time-limited use

Data Description

  • Study period: 1st January 2022 to 31st December 2022

  • Geography: England, Wales and Scotland (i.e., Great Britain)

1,184,905

‘Genuine’ customers across GB in 2022

~3 billion transaction records

1. Account Registration Info

Fictional illustrative sample of account registration information table.

2. Daily Account Balance (~33 million)

Fictional illustrative sample of the curated daily account balance table.

3. Bet Slips (~52 million)

Fictional illustrative sample of the curated bet slip transaction table.

4. Gaming Sessions (~64 million)

Fictional illustrative sample of the curated gaming session transaction table.

37 Behavioural Features

Frequency (7)

  • Average gap between deposit-days

  • Withdrawal-deposit ratio

  • Prop. of gambling-active days

Intensity (11)

  • Average stake amount per gambling-day

  • Average monthly loss amount

  • Average session duration

Riskiness (5)

  • Prop. of loss-days

  • Average potential return per bet

  • Prop. of acca bets

Variability (14)

  • SD gap between deposit-days

  • Prop. of popular-day plays

  • Total no. of activities bet
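
As a rough illustration of how features of this kind might be derived, the sketch below computes three of them (average and SD gap between deposit-days, and proportion of gambling-active days) from a toy daily table. The column names (`customer_id`, `date`, `deposit`, `stake`), the toy values and the 365-day denominator are assumptions made for this example; they are not the operator's schema or the authors' exact definitions.

```python
import pandas as pd

# Hypothetical daily account table: one row per customer per recorded day.
daily = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-04", "2022-01-10", "2022-01-11",
        "2022-03-01", "2022-03-15",
    ]),
    "deposit": [20.0, 0.0, 10.0, 5.0, 50.0, 50.0],
    "stake": [5.0, 10.0, 0.0, 5.0, 25.0, 0.0],
})

DAYS_IN_STUDY = 365  # assumed denominator: 1 January to 31 December 2022

# Gap in days between consecutive deposit-days, per customer.
deposits = (
    daily[daily["deposit"] > 0]
    .sort_values(["customer_id", "date"])
    .assign(gap_days=lambda d: d.groupby("customer_id")["date"].diff().dt.days)
)
gap_stats = deposits.groupby("customer_id")["gap_days"].agg(
    avg_deposit_gap="mean",  # frequency-type feature
    sd_deposit_gap="std",    # variability-type feature (NaN with a single gap)
)

# Share of study days on which the customer staked anything.
active_days = daily[daily["stake"] > 0].groupby("customer_id")["date"].nunique()
prop_active = (active_days / DAYS_IN_STUDY).rename("prop_active_days")

features = gap_stats.join(prop_active)
print(features)
```

In this toy table, customer 1 deposits on three days with gaps of 9 days and 1 day, so the average gap is 5 days; customer 2 has only one gap, so the SD is undefined.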

Caveat

  • Data from a single operator

  • Data do not include online lottery participation

Clustering

Methods

  • Segmented into active and dormant gamblers

  • Active gamblers further split based on their engagement with different products

  • Clustering conducted separately for each of the three active gambler groups

%%{init: {'flowchart': {'nodeSpacing': 20, 'rankSpacing': 10}}}%%
flowchart LR
    A[("Genuine gamblers")]  --> act["Active gamblers (67.2%)"]
    A --> F["Dormant gamblers (32.8%)"]
    
    subgraph Clustering[ ]
        direction LR
        subgraph id1["Participated in both (BG)"]
            direction TB
            BG["Group BG (36%)<br>37 features"] --> tr1([IHS transformation<br>& min-max scaling]) --> I([PCA]) --> J([K-means]) --> Q["Subgroups BG (6)"  ]
        end
        subgraph id2["Betting-exclusive players (B)"]
            direction TB
            B["Group B (14.1%)<br>28 features"] --> tr2([IHS transformation<br>& min-max scaling]) --> L([PCA]) --> M([K-means]) --> R["Subgroups B (2)"]
        end
        subgraph id3["Gaming-exclusive players (G)"]
            direction TB
            G["Group G (17.1%)<br>27 features"] --> tr3([IHS transformation<br>& min-max scaling]) --> O([PCA]) --> P([K-means]) --> S["Subgroups G (3)"]
        end
    end
    
    act --> BG
    act --> B
    act --> G

    %% Styling
    classDef default font-size:18px,font-family:Serif;
    classDef header fill:#2c3e50,color:white,stroke:none;
    classDef group fill:#ecf0f1,stroke:#bdc3c7,stroke-width:2px;
    classDef method fill:#d5e8d4,stroke:#82b366,stroke-width:1px;
    classDef output fill:#ffe6cc,stroke:#d79b00,stroke-width:4px;
    classDef cluster fill:#f8f9fa,stroke:#95a5a6,stroke-width:1px,stroke-dasharray:5 5,font-size:13px,font-weight:bold;

    class A header;
    class act,BG,B,G group;
    class I,J,L,M,O,P,tr1,tr2,tr3,sc1,sc2,sc3 method;
    class Q,R,S,F output;
    class Clustering cluster;
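
The per-group pipeline in the flowchart (IHS transformation, min-max scaling, PCA, K-means) could be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the authors' implementation: the synthetic input matrix, the 90% explained-variance threshold for PCA, the choice of `k = 3` and the random seeds are all assumptions, since the slides do not state how these were set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for a customers-by-features matrix; right-skewed,
# as monetary and count features often are. 27 columns mirrors Group G.
X = rng.lognormal(mean=0.0, sigma=1.5, size=(500, 27))

pipeline = make_pipeline(
    # Inverse hyperbolic sine: asinh(x) = ln(x + sqrt(x^2 + 1)).
    # Behaves like a log for large values but is defined at zero.
    FunctionTransformer(np.arcsinh),
    # Rescale every feature to the [0, 1] range.
    MinMaxScaler(),
    # Keep enough components to explain 90% of variance (assumed threshold).
    PCA(n_components=0.9, svd_solver="full"),
    # k = 3 is illustrative only.
    KMeans(n_clusters=3, n_init=10, random_state=0),
)

labels = pipeline.fit_predict(X)      # one cluster label per customer
print(np.bincount(labels))            # cluster sizes
```

In practice the number of clusters would be chosen per group (for example with an elbow or silhouette criterion), and the pipeline would be fitted separately on the feature matrix of each active gambler group.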

Results

6 clusters from Group BG

2 clusters from Group B

3 clusters from Group G

Hierarchical representation of 12 gambler subgroups within active and dormant gamblers.

Cluster Names

Cluster names of the online gambler classification.

Case Study:

Patterns of Play

Range plots characterising the five subgroups within Groups B and G by 37 features across four domains, relative to the supergroup average. The vertical dotted lines indicate the supergroup average.

Activities Gambled

Proportion of committed customers for each activity within the subgroup, with minimum and maximum values for each row highlighted in green and pink, respectively.

Days of Participation

Calendar heat-maps illustrating the distribution of customer participation levels across days of the year, by Subgroup.

Age-Gender Distributions

Population pyramids illustrating the age-gender distribution of customers in each subgroup, relative to the adult population. Vertical black line = index score (IS) of the active-gambler benchmark; grey error bars = 95% CI for each calculated IS.

Geodemographic Profiling 1

Bar charts illustrating the representation of betting and gaming across 2021 OAC Groups, relative to the adult population of GB (index score = 100). Black dots denote the index scores (ISs) of all active gamblers in each Group. Error bars indicate 95% confidence intervals.
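
For readers unfamiliar with index scores, the sketch below shows one common way to compute them: the share of customers in a neighbourhood group divided by that group's share of the adult population, multiplied by 100. The confidence interval here uses a normal approximation to the customer share and treats the population share as fixed; this is an assumption for illustration, as the slides do not state how the intervals were calculated. The function name and all counts are hypothetical.

```python
import math


def index_score(customers_in_group, customers_total,
                adults_in_group, adults_total, z=1.96):
    """Return (score, lower, upper) for one neighbourhood group.

    A score of 100 means the group is represented among customers
    exactly in proportion to its share of the adult population.
    """
    p_customers = customers_in_group / customers_total
    p_adults = adults_in_group / adults_total
    score = 100 * p_customers / p_adults
    # Assumed interval: binomial standard error of the customer share,
    # rescaled to the index; the population share is taken as known.
    se = math.sqrt(p_customers * (1 - p_customers) / customers_total)
    half_width = 100 * z * se / p_adults
    return score, score - half_width, score + half_width


# Hypothetical counts for a single OAC Group.
score, lower, upper = index_score(1_500, 10_000, 5_000_000, 50_000_000)
print(f"IS = {score:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

With these made-up counts the group holds 15% of customers but 10% of adults, giving an index score of about 150, i.e. over-representation relative to the population baseline of 100.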

Geodemographic Profiling 2

2019 Harmonised IMD deciles

Novelty & Implications

Questions?